Partial dependence for multiclass #1554

freddyaboulton · 2020-12-14T19:37:11Z

Pull Request Description

Example on wine dataset

After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

codecov · 2020-12-14T19:46:48Z

Codecov Report

Merging #1554 (39c1cc4) into main (e224159) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1554     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         236      236             
  Lines       16877    16933     +56     
=========================================
+ Hits        16869    16925     +56     
  Misses          8        8

Impacted Files	Coverage Δ
evalml/model_understanding/graphs.py	`99.8% <100.0%> (+0.1%)`	⬆️
...lml/tests/model_understanding_tests/test_graphs.py	`100.0% <100.0%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

ParthivNaresh

@freddyaboulton Everything looks great!

ParthivNaresh · 2020-12-14T23:04:33Z

evalml/model_understanding/graphs.py

+            fig.add_trace(_go.Scatter(x=part_dep.loc[part_dep.class_label == label, 'feature_values'],
+                                      y=part_dep.loc[part_dep.class_label == label, 'partial_dependence'],
+                                      line=dict(width=3)),
+                          row=1, col=i + 1)


@freddyaboulton Great! Also, if the dataset has a large number of labels, it might be difficult for the user to see all the partial dependence plots in one row.
Maybe:
_subplots.make_subplots(rows=(len(class_labels) + 1) // 2, cols=2, ...)
and then in line 523:
row=(i + 2) // 2 and col=(i % 2) + 1.
Not really a big deal either way.

Sounds good! The only thing missing was that we should use two columns only when class_label=None; otherwise there would be an empty second column in the plot.
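
The layout arithmetic from this thread can be sketched without plotly. The helper below is hypothetical (`subplot_layout` is not part of evalml). It assumes `i` is the zero-based index of a class, and that a single plot should get one column so that no empty second column appears.

```python
def subplot_layout(n_plots):
    """Return (rows, cols, positions) for a grid at most two columns wide.

    positions[i] is the 1-based (row, col) of the i-th plot, using the
    row=(i + 2) // 2, col=(i % 2) + 1 arithmetic from the review comment.
    """
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    # Assumption: a single plot gets a single column to avoid an empty cell.
    cols = 1 if n_plots == 1 else 2
    # Two plots per row, rounding up when the count is odd.
    rows = (n_plots + 1) // 2 if cols == 2 else 1
    positions = [((i + 2) // 2, (i % 2) + 1) for i in range(n_plots)]
    return rows, cols, positions
```

For three classes this gives a 2x2 grid with the plots at (1, 1), (1, 2) and (2, 1), leaving only the last cell empty.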

ParthivNaresh · 2020-12-14T23:21:31Z

evalml/model_understanding/graphs.py


+    data = pd.DataFrame({"feature_values": np.tile(values[0], avg_pred.shape[0]),
+                         "partial_dependence": np.concatenate([pred for pred in avg_pred])})


angela97lin

Left two nit-picky things, but otherwise LGTM!! This is great stuff 😁

angela97lin · 2020-12-16T16:30:16Z

evalml/model_understanding/graphs.py

+    data = pd.DataFrame({"feature_values": np.tile(values[0], avg_pred.shape[0]),
+                         "partial_dependence": np.concatenate([pred for pred in avg_pred])})
+    if classes is not None:
+        data['class_label'] = np.repeat(classes, len(values[0]))


Nit-pick: Since we're changing the output to return this new field in the DF, could be good to update this docstring too?

Good suggestion! Done!
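
As a sketch of how the tiled and repeated columns in this hunk line up, assuming `avg_pred` has shape (n_classes, n_grid_points) and `values[0]` holds the grid for the feature. The numbers and class names below are made up for illustration.

```python
import numpy as np

# Made-up inputs: 3 classes, 2 grid points.
grid = np.array([0.0, 1.0])          # stands in for values[0]
avg_pred = np.array([[0.1, 0.2],     # class_0
                     [0.3, 0.4],     # class_1
                     [0.6, 0.4]])    # class_2
classes = ["class_0", "class_1", "class_2"]

columns = {
    # The whole grid is repeated once per class: [0, 1, 0, 1, 0, 1].
    "feature_values": np.tile(grid, avg_pred.shape[0]),
    # Rows of avg_pred are laid end to end, class by class.
    "partial_dependence": np.concatenate(list(avg_pred)),
    # Each label is repeated once per grid point so it stays aligned
    # with the block of predictions for that class.
    "class_label": np.repeat(classes, len(grid)),
}
```

Passing `columns` to `pd.DataFrame` would give a long-format table with one row per (class, grid point) pair.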

angela97lin · 2020-12-16T16:31:14Z

evalml/model_understanding/graphs.py

@@ -476,6 +486,10 @@ def graph_partial_dependence(pipeline, X, feature, grid_resolution=100):
        feature (int, string): The target feature for which to create the partial dependence plot for.
            If feature is an int, it must be the index of the feature to use.
            If feature is a string, it must be a valid column name in X.
+        class_label (string, None): Name of class to plot for multiclass problems. If None, will plot


Alternatively:

class_label (string, optional): Name of class to plot for multiclass problems. If None, will plot...; Defaults to None.

angela97lin · 2020-12-16T17:07:24Z

Looks great locally too; just one nit-picky suggestion, if it's not too difficult: is it possible to update the trace names? 🤔

freddyaboulton · 2020-12-16T17:19:15Z

@angela97lin good catch! I just pushed this up and updated the tests for multiclass/not multiclass.

bchen1116

LGTM! One thing I think would be cool to have is to allow users to pass in a list of classes to create these plots for in multiclass scenarios. For instance, if I, as a user, created a multiclass pipeline with 20 possible target classes, I might want to plot partial dependence for only a subset of the classes, rather than for one or all. Certainly not blocking, but wanted to bring that suggestion up for possible discussion.

bchen1116 · 2020-12-16T19:34:36Z

evalml/model_understanding/graphs.py

+                                      y=part_dep.loc[part_dep.class_label == label, 'partial_dependence'],
+                                      line=dict(width=3),
+                                      name=label),
+                          row=(i + 2) // 2, col=(i % 2) + 1)


freddyaboulton · 2020-12-16T20:06:05Z

Great suggestion @bchen1116! I'm on board, but let's continue the discussion in #1565, since this issue/PR tracks returning the partial dependence for all the classes as opposed to just the first one in multiclass problems.
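
Independent of whatever is decided in #1565, one user-side option is to filter the long-format output on its class_label column. The sketch below uses plain dictionaries as stand-in rows. `select_classes` is a hypothetical helper, not an evalml function, and the data is made up.

```python
def select_classes(rows, wanted):
    """Keep only rows whose 'class_label' is in `wanted`.

    `rows` is a list of dicts standing in for the long-format table.
    """
    wanted = set(wanted)
    present = {row["class_label"] for row in rows}
    missing = wanted - present
    if missing:
        # Fail loudly on a misspelled label instead of returning nothing.
        raise ValueError(f"Unknown class labels: {sorted(missing)}")
    return [row for row in rows if row["class_label"] in wanted]


rows = [
    {"feature_values": 0.0, "partial_dependence": 0.1, "class_label": "class_0"},
    {"feature_values": 1.0, "partial_dependence": 0.2, "class_label": "class_0"},
    {"feature_values": 0.0, "partial_dependence": 0.3, "class_label": "class_1"},
    {"feature_values": 1.0, "partial_dependence": 0.4, "class_label": "class_1"},
    {"feature_values": 0.0, "partial_dependence": 0.6, "class_label": "class_2"},
    {"feature_values": 1.0, "partial_dependence": 0.4, "class_label": "class_2"},
]
subset = select_classes(rows, ["class_0", "class_2"])
```

With a pandas DataFrame the same selection could be written as `part_dep[part_dep.class_label.isin(["class_0", "class_2"])]`.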

freddyaboulton marked this pull request as ready for review December 14, 2020 20:07

freddyaboulton requested a review from dsherry December 14, 2020 20:08

freddyaboulton self-assigned this Dec 14, 2020

freddyaboulton requested review from bchen1116, angela97lin, ParthivNaresh, christopherbunn, eccabay and jeremyliweishih December 14, 2020 20:08

ParthivNaresh approved these changes Dec 14, 2020

freddyaboulton force-pushed the 1404-partial-dependence-multiclass branch from f029bcb to 35be5f4 Compare December 15, 2020 15:55

angela97lin approved these changes Dec 16, 2020

freddyaboulton added 6 commits December 16, 2020 12:06

Returning partial dependence for each class (no test yet)

9988011

Adding multiclass support for partial dependence.

580d008

Added PR 1554 to release notes.

b9cf0c6

Laying out the partial dependence plots in multiple rows for multiclass.

0ec7c9b

Adding whitespace around operators.

d2d11c9

Fixing typos/updating docstrings in partial_dependence and graph_partial_dependence.

03fd463

freddyaboulton force-pushed the 1404-partial-dependence-multiclass branch from e17d49e to 03fd463 Compare December 16, 2020 17:06

Adding class label as trace.

39c1cc4

bchen1116 approved these changes Dec 16, 2020

freddyaboulton mentioned this pull request Dec 16, 2020

Should users be able to plot partial dependence for any subset of classes in multiclass problems? #1565

Open

freddyaboulton merged commit 7e21b20 into main Dec 16, 2020

freddyaboulton deleted the 1404-partial-dependence-multiclass branch December 16, 2020 21:49

dsherry mentioned this pull request Dec 29, 2020

Release v0.17.0 #1623

Merged
